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Abstract — The usual nonparametric approach to spectral 
analysis is revisited within the regularization framework. Both 
usual and windowed periodograms are obtained as the squared 
modulus of the minimizer of regularized least squares criteria. 
Then, particular attention is paid to their interpretation within 
the Bayesian statistical framework. Finally, the question of un- 
supervised hyperparameter and window selection is addressed. 
It is shown that maximum likelihood solution is both formally 
achievable and practically useful. 

Index Terms — Quadratic regularization, penalized criterion, 
spectral analysis, periodograms, windowing, zero-padding, hy- 
perparameters, window selection. 
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SPECTRAL ANALYSIS is a fundamental problem in 
signal processing. Historical papers such as [1], tutorials 
such as [2] and books such as [3,4] are evidences of the basic 
role of spectral analysis, whether parametric or not. 

Nonparametric approach has recently prompted renewed 
interest [5] (see also [6]) within the regularization framework 
and the present contribution brings a new look at these 
methods. It provides statistical principles rather than empirical 
ones in order to derive periodogram estimators. From this 
standpoint, the major contribution of the paper is twofold. 
Firstly, it proposes new coherent interpretations of existing 
periodograms and modern justification for windowing tech- 
niques. Secondly, it introduces a maximum likelihood method 
for automatic selection of the window shape. 

Jean-Francois GlOVANNELLl and Jerome IDIER are with the Laboratoire 
des Signaux et Systemes (CNRS - SUPELEC - UPS) SUPELEC, Plateau de 
Moulon, 91192 Gif-sur-Yvette Cedex, France (emaux: giova@lss.supelec.fr, 
idier@lss.supelec.fr). 



Moreover, [5] suffers from a twofold limitation. On the 
one hand, the proposed model relies on discrete frequency 
whereas the frequency is a continuous variable. On the other 
hand, restriction to separable regularization functions does not 
allow spectral smoothness to be accounted for. The present 
contribution overcomes such limitations. 

It takes advantage of a natural model in spectral analysis 
of complex discrete-time series: the sum of side by side pure 
frequencies. Two cases are investigated: 

1) the continuous frequency (CF) case which relies on an 
infinite number of pure frequencies v £ [0, 1[ with 
amplitudes a(v), a E L 2 

2) the discrete frequency (DF) one which relies on a finite 
number, say P (usually large), of equally spaced pure 
frequencies v p = p/P, with amplitudes a p . Let us note 
a = [ao, ■ • ■ , ap-i] e (D p and u = [z/ , . . . , Vp-i] € 
[0,1 [ p - 

For N complex observed samples y = [yo, . . . , yjv-i] G 
<D N , such models read 



CF: y v 



a{v)e 2l ™ n dv + b n 



DF: 



Vn 



= p-1/2 



P-l 



iirpn/ P if. 



(1) 



(2) 



(3) 



where b = [bo, . . . , 6jv— l] € C A accounts for model and 
observation uncertainties. Let us introduce WW and W NP : 

CF: W N : L 2 — ► <C N 
DF: W NP : € p — > <£ N 

the CF and DF truncated IFT so that 

CF: y = W N a + b, 
DF: y = W NP a + b. 

The current problem consists in estimating the amplitudes 
a and/or a. Thanks to the linearity of these models w.r.t. the 
amplitudes, the problem clearly falls in the class of linear 
estimation problems [7-9]. But, in practice, estimation relies 
on a finite, maybe small, number of data N. As a consequence, 
in the CF case, a continuous frequency function a lying in L 2 
must be selected from only N data. Such a problem is known 
to be ill-posed in the sense of Hadamard [8]. In the same way, 
under the DF formulation, since the amplitudes outnumber the 
available data, the problem is underdeterminate. 

This kind of problem is nowadays well identified [8, 10] 
and can be fruitfully tackled by means of the regularization 
approach. This approach rests on a compromise between 
fidelity to the data and fidelity to some prior information 
about the solution. As mentioned above, such an idea has 
already been introduced in several papers [5] but also in [11- 
14]. In the autoregressive spectral estimation problem, [11] 
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proposes to account for spectral smoothness as a function of 
autoregressive coefficients. Otherwise, high resolution spec- 
tral estimation has been addressed within the regularization 
framework, founded on the Poisson-Gaussian model [14]. The 
present paper deepens Gaussian models and is organized as 
follows. 

Section [TT] focuses on the interpretation of usual peri- 
odograms (UP) and Section [HI] deals with the interpretation 
of windowed periodograms (WP) both using penalized ap- 
proaches with quadratic regularization. Results are exposed 
in four Propositions and the corresponding Proofs are given 
in Appendix . A Bayesian interpretation is presented in 
Section [TV] while the problem of parameter estimation and 
window selection are addressed in Section [V] Finally, con- 
clusions and perspectives for future works are presented in 
Section ED 

II. Usual periodogram 

A. Continuous frequency 

The problem at stake consists in estimating a 6 L 2 given 
data y such that (0. A first possible approach is founded on 
the Least Squares (LS) criterion 



JV-l 



{y-W N a)\y-W N a) = ^ 



71=0 



Un 



1 



a(v)e 



dv 



but, since W N is one-to-many and not many-to-one, there 
exists an infinity of solutions in L 2 . Here, the preferred 
solution for raising the indetermination relies on Regularized 
Least Squares (RLS). The simplest RLS criterion is founded 
on quadratic "separable regularization": 



Cu(o) = {y-W N a)\y-W N a) + X f \a(v)\ 2 dv , 

Jo 



(4) 



where "u" stands for usual. The regularization parameter 
A balances the trade-off between confidence in the data 
and confidence in the penalization term. For any A > 0, the 
Proposition below gives the minimizer a x of (0]). 



Proposition 1 — (CF/UP). For any A > 0, the unique 
minimizer of (@) reads 



N-l 



n=0 



d\ v ) = {i + \)- 1 Y,yne^ vr ' 

Proof: See appendix lAl 



(5) 



B. Discrete frequency 

This subsection investigates the DF counterpart of the 
previous result. In the DF approach, the LS criterion reads 



(y - W NP a)\y - W NP a) 



(6) 



quadratic "separable regularization", the corresponding RLS 
criterion is 

Qu(a) = (y- W NP a)\y - W NP a) + \a^a, (7) 
with optimum given in the next Proposition. 

Proposition 2 — (DF/UP). For any A > 0, the unique 
minimizer of (0 reads 

a x = {\ + \y 1 F P y P , (8) 

where yp denotes the vector y zero-padded up to size P. 

Proof: See appendix iBl ■ 



C. Usual periodogram: concluding remarks 

In the CF cases, the squared modulus of the penalized 
solutions |a A (V)| 2 is proportional to the usual zero-padded 
periodogram. Moreover, | a. A | 2 is3 a discretized version of 
|a A (zv)| 2 over the frequency grid v. So, within the proposed 
framework, separable quadratic regularization leads to the 
usual zero-padding technique associated with the practical 
computation of periodograms. Moreover, when A tends to 
zero, the proportionality factor tends to one. It is noticeable 
that, in this case, the criteria and 01 degenerate but their 
minimizer does not: they are the solution of the constraint 
problems 



CF: min / \a(v)\ 2 dv s.t. y = W N a. 

aeL 2 J Q 



DF: min a^a 

oec p 



s.t. y = W NP a . 
i.e., solution of the noiseless problems adressed in [5,6]. 



III. Windowed periodogram 

The previous section investigates the relationships between 
the separable regularizers and the usual (non-windowed) peri- 
odograms. The present section focuses on smoothing regular- 
izers and windowed periodograms (see [15] which analyzes 
dozens of windows to compute smoothed periodograms). 

A. Continuous spectra 

This subsection generalizes the usual norm in L 2 to the 
Sobolev [16] regularizer: 

n Q (a) -- 



i Q 

q=0 



did, . 



dv . 



which can be interpreted as a measure of spectral smoothness. 
The a q are positive real coefficients and can be generalized 
to positive real functions [8]. TZq is defined onto the Sobolev 
space [16] H Q C L 2 . Note that H° = L 2 and that the usual 



but, since W NP is one-to-many and not many-to-one, there i H u g ^ ^ denotes the vector of the squared moduli of the 
also exists an infinity of solutions in <D P . According to the component of u. 
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norm invoked in subsection III-AI is the regularizer IZo with 
a = 1. 

Remark 1 — Strictly speaking, 1Zq{cl) is not a spectral 
smoothness measure, since it is not a function of \a[y)\ but a 
function of a(y), including phase. A true spectral smoothness 
measure does not depend on the phase of a(y) and does not 
yield a quadratic criterion. The same remark holds for the 
definition of spectral smoothness proposed by Kitagawa and 
Gersh [11]. 

Accounting for spectral smoothness by means of 1Zq(o) 
yields a new penalized criterion 

Q s (a) = (y- W N a)\y - W N a) + \K Q {a) , (9) 

where the index "s" stands for smoothness. 



Proposition 3 — (CF/WP). With the previous notations and 
definitions, the minimizer of (|2P reads 



N-l 



a u {v) = us n y n e~ 



(10) 



i.e., a windowed FT. The window shape is 

Wn = {1 + XSn)- 1 , (11) 

Q 

with e p = J2 a i( 2lT P) 2q far pel,. (12) 
Proof: See appendix ICl ■ 



B. Discretized spectra 

This subsection is devoted to the generalization of crite- 
rion (0 to non-separable penalization 

Q s (a) = (y - W NP a)\y - W NP a) + Xa^n a a . (13) 

Given that the sought spectrum is circular-periodic, the penal- 
ization term has to be designed under circularity constraint. 
As a consequence, n a is a circular matrix, its eigenvalues, 
denoted e p ,p S Np, can be calculated as the FT of the first 
row of n a . Moreover, without loss of generality, we assume 
that the diagonal elements of 11^ 1 are equal to one and any 
scaling factor is integrated in the parameter A. 



Proposition 4 — (DF/WP). The minimizer of (O reads 

a w = F P y, (14) 
where the y p = w p y p for p S Np and 
w p = (1 + Ae p ) _1 . 

Proof: See appendix iDl ■ 



C. Windowed periodograms: concluding remarks 

Hence, in the CF case, the squared modulus of the penalized 
solution a u is the windowed periodogram associated with 
window uj n . Moreover, the DF solution a w is a discretized 
version of a", as soon as the e n are identified with the e n . 
As a conclusion quadratic smoothing regularizes interpret 
windowed periodograms. Moreover, it is noteworthy that 
d u (v) and a w only depend on e„ and e n for n S Nat. 



Remark 2 — Empirical power. One can easily show: 



CF: 
DF: 



\a(v)\ 2 dv 



„= <\y n \ - 



(15) 



Hence, the empirical power of the estimated spectra is smaller 
than the empirical power of the observed data and equality 
holds if and only if A = 0. 

Example 1 — Zero-order penalization. The most simple 
example consists in retrieving the non-windowed case of 
section Ul-A \ and [lI-B\ Let us apply the previous Propositions\3\ 
and^with regularize rs 



CF: 
DF: 



\a(v)\ 2 dv i.e., Q — and a a = 1 , 



(16) 



a) a 



i.e., n„ = / 



p ■ 



Then, we have s n — G n — 1, the criteria (0 and M3\ respec- 
tively become (@ and and the solutions ( 17 OP and ( 1741 ) re- 
spectively become Q and (0: as expected, the non-windowed 
solutions are retrieved. A more interesting example is the one 
given below. 

Example 2 — First-order penalization. Let the penalization 
term be 



(17) 



CF: / \a\v)\ z dv, 



DF: -P 2 J2\ a k-a k -i\ 



with ap = do for notational convenience of the circularity 
assumption. Application of Propositions\3\and^\ respectively 
yields e n = 4ir 2 n 2 (CF case) and e n = (1 — cos 2im/P) (DF 
case). The corresponding windows read 



CF: uj n = (1 + ATr 2 n 2 X)- 1 , 

DF: w n = (1 + A- Acos27m/P)- 1 . 



(18) 



In the following, we refer to them as the Cauchy and the 
inverse cosine windows. Moreover, for a finer discretization 
of the spectral domain, limp^oo e n — s n and one can retrieve 
the Cauchy window as the limit of the inverse cosine window. 
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■erse cosine window 



Cauchy window 



Prior correlation 







Time 



Fig. 1 . Inverse cosine window (lhs) and Cauchy window (rhs) as a function 
of A. In both cases, A = yields a constant shape. Furthermore, for any A, 
u>o = wo = 1. Otherwise, as A increases the window shape decreases faster 
to zero and the corresponding spectrum is smoothed. 



IV. Bayesian interpretation 

This section is devoted to Bayesian interpretations of the 
penalized solutions presented in Propositions and 
Moreover, since usual non-windowed forms are particular 
cases of windowed forms, we focus on the latter. 

Since the considered criteria are quadratic, their Bayesian 
interpretations rely on Gaussian laws. Therefore, the Bayesian 
interpretations only require the characterization of means and 
correlation structures for the stochastic models at work. 

A. Discrete frequency approach 

In the DF case, i.e., in the finite dimension vector space, 
the Bayesian interpretation of the criteria and ( fT~3T > as a 
posterior Co-Log-Likelihood is a classical result [10]. Within 
this probabilistic framework, the likelihood of the parameters 
a attached to the data y is 



f{y\a) = (irr b ) 



-N 



exp — (y ■ 

n 



W NP a)\y - W NP a) 



From a statistical viewpoint, it essentially results from the 
linearity of the model 01 and from the hypothesis of a zero- 
mean, circular (in the statistical sense), stationary, white and 
Gaussian noise vector b, with variance r^. 

Moreover, in order to interpret the regularization term 
of ( fT3l >. a zero-mean, circular, correlated Gaussian prior with 
covariance R a = r a Il~ 1 is introducecd Matrix II~ 1 is the 
normalized covariance structure, i.e., all its diagonal elements 
are equal to 1, while r a stands for the prior power. So, the 
prior density reads 

f(a) = (irr a y N det II a exp ■ 



-a^H„a . 



The Bayes rule ensures the fusion of the likelihood and the 
prior into the posterior density 

f(a\y) cxexp— Q s {a) , 

n 

where Q s is given by Eq. dT~3T >. The regularization parameter 
A is clearly A = rb/r a . 

Thus, we have a Bayesian interpretation of the criterion (fT3T l 
related to windowed periodograms. Interpretation of the cri- 
terion (0 related to usual ones results from a white prior: 

2 Rigorously speaking, this is possible only if n a is invertible. 
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Fig. 2. Usual windows and the corresponding correlations. The lhs column 
shows the time window and the rhs column shows the associated correlations. 
From top to bottom: the Hamming, the Hanning, the inverse cosine and the 
triangular. 



II a = I p. Finally, interpretations of the RLS solutions (0 
and (TBI i themselves, result from the choice of the Maximum A 
Posteriori (MAP) as a punctual estimate. Moreover, thanks to 
the Gaussian character of posterior law, other basic Bayesian 
estimators such as Posterior Mean (PM) and Marginal MAP 
(MMAP), are equal to the MAP solution itself. 

B. Continuous frequency case 

1) General theory: In the CF case, the Bayesian inter- 
pretation is more subtle since it relies on continuous index 
stochastic processes. Indeed, no posterior likelihood for the 
parameter a is available. So, there is no direct posterior 
interpretation of the criteria (0 and (0, nor MAP interpre- 
tation of the estimates (0 and ( fTQb . Roughly speaking, the 
posterior law vanishes everywhere. Nevertheless, there is a 
proper Bayesian interpretation of the estimates (0 and ( fTQb 
as PM or MMAP as shown below. 

Let us introduce a zero-mean, circular (in the statistical 
sense) and Gaussian prior law [17] for a. This law is fully 
characterized by its correlation structure j a {v),v € [—1,1], 
which is entirely described by its values for v G [0, 1] thanks 
to Hermitian symmetry. Furthermore, the usual circular- 
periodicity assumption for a{ii) results in another symmetry 
property: 7 a (l/2 + v) = y a (l/2 - v) for any v G [0, 1/2]. 

By assuming j a G L2, the latter can be expanded into a 
Fourier series: 

with Fourier coefficients 7 a G £2 given by: 



TaO) 



[0,1] 



la{v)e 



— li-Kvp 



pel*. 



Let us note c a {v) — "fai^/fa the normalized correlation and 
c°a G I2 the corresponding Fourier sequence. 
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Proposition 5 — With the previous notations and prior 
choice, the posterior mean of a[v) is: 

N-l 

E[a(u)\y] = a"» = u; n y n e- 2i ™ n , (19) 



n=0 



with UJ n = [l + \ca(n)~ 



(20) 



Proof: See appendix [E] ■ 

Comparison of (fT9T> - (f20b and (TlOb- dTTb immediately gives 
the Bayesian interpretation of windowed FT as PM^|: c a (n) = 
e" 1 , i.e., identification of the Fourier coefficients of the prior 
correlation c a (v) and the FT of the discrete correlation n a . 

2) Example 3: The present subsection is devoted to a 
precise Bayesian interpretation of deterministic examples Q] 
and |2 As we will see, there is a new obstacle in the Bayesian 
interpretation of these examples because the underlying cor- 
relations do not lie in L 2 . In order to overcome this difficulty 
we first interpret the penalization of both zero-order and first- 
order derivative: 

n 2 (a) = a f \a{v)\ 2 dv + aj f |o»| 2 {v) dv . (21) 
Jo Jo 

The case of pure zero-order and pure first-order are obtained 

in sections riV-B.2.bl and IIV-B.2.cl as limit processes. 

As seen in Proposition [3] Eq. (fl2l i. the associated coef- 



ficients are: e E 



a 



An a\p , p G TL. According to 



Proposition [5] the Fourier series coefficients for ^ a (y) are 
la{v) = £p 1 - 11 is clear that j a € £ 2 , hence j a g L 2 and 



E 

pea 



1 



ao + 4ir 2 aip 2 



-2iiri>p 



, v e [0, 1] . (22) 



It is shown in Appendix IG.2.al that, with a = \fa^Joi\ and 
a' = y/aoai, ~f a {v) reads 

cosha(M - 1/2) 



G [-1,1], 



(23) 



2a' sinha/2 

and several analytic properties are straightforwardly deduced. 
In particular, -f a has a continuous derivative over [— 1, 1] — {0} 
and the slopes at v = CP and v = + are respectively 1/ai 
and — l/ot\. 7 Q is minimum at v = 1/2 and maximum at 
v = — 1, v = and v = 1. Moreover its integral from to 1 
remains constant and equals 1/ao- 

a) Markov property: The present paragraph addresses 
the Markov property of the underlying prior process a(v) [18, 
19]. This process cannot be stricto sensu a Markov chain since 
it is circular-periodic: "future" frequency and "past" frequency 
cannot be independent. However, we show the Markov prop- 
erty for the conditional process a(y) = [a(^)|a(l)]„ G [ 0i i[. It 
is shown in Appendix IG.2.bl that its correlation structure reads 

la{v)la{v') 



7aOX) = la{v-v') 

la (0) 

sinhaj/ sinha(l — v) 
a' sinh a 



(24) 



(25) 



3 Since a(u)\y is a scalar Gaussian random variable E[a(i/)|j/] is also the 
MMAP. 



for any v, v' 6 [0,1],^ v' . According to the sufficient 
factorization of the correlation function proposed in [20, p. 64], 
it turns out that a(v) is a Markov chain. 

b) Limit case as a\ — > 0: As ot\ tends to zero, it is easy 
to show that for each v e]0, 1[, the correlation 7 a (V) tends 
to zero i.e., there is no more correlation between a[v\) and 
0(1*2) as soon as v\ 7^ v 2 and (1*1,1*2) 7^ (0,1)- Moreover, 
7 a (0) and 7 a (l) tend to infinity while the integral of -f a over 
[0,1] remains 1/ao. Roughly speaking, the limit correlation 
is a Dirac distribution at v = and v = 1 with weight l/2ao 
i.e., the limit process is a circular white Gaussian noise with 
"pseudo-power" 1/ao. 

c) Limit case as ao — > 0: This case is more complex 
than the previous one since W G [0, 1], 7 a (^) tends to infinity 
as ao tends to zero. So, we propose a characterization of the 
limit process via its increments. Let v\^v 2 ^v'^v' 2 £ [0,1], 
v\ < v 2 < V\ < v' 2 ■ Let us also note the frequency increments 
t v — v 2 — v\ and t' v = v' 2 — v[ and the vector of the increments 
themselves i = [0(1*2) — 0(1^1), 0(1^4) — 0(^3)] € <D 2 - This 
vector is clearly Gaussian and zero-mean. Furthermore, it is 
shown in Appendix IG.2.cl that its covariance matrix reads 



Ri — 



1 

2oT 



(26) 



T u (l - T v ) 

2r v r' v t' v {1-t' v )_ 

It turns out that the process a(v) = a(v) — a(0) is a Brownian 
bridge [21, p.36]. 



V. Hyperparameter and window selection 

The problem of hyperparameter estimation within the regu- 
larization framework is a delicate one. It has been extensively 
studied and numerous techniques have been proposed and 
compared [22-27]. The Maximum Likelihood (ML) approach 
is often chosen associated with the Bayesian interpretation. 
In the following subsections, we address regularization pa- 
rameter estimation and automatic window selection using ML 
estimation. 

A. Hyperparameters estimation 

In our context, the ML technique consists in integrating the 
amplitudes out of the problem and maximizing the resulting 
marginal likelihood w.r.t. the hyperparameters. Thanks to the 
linear and Gaussian assumptions, the marginal law for the 
data, namely the likelihood function, is also Gaussian 

f(y ! r a, n) oc (det Ry)" 1 exp -y^Ry^y . (27) 

Moreover, the covariance structure R y can be easily derived, 
as shown in the two following sections. 

1 ) Discrete frequency marginal covariance: In the present 
case, since all random quantities are in a finite dimensional 
linear space, the covariance is clearly 

R y = r a (W NP U- 1 Wl P + XI N ) = r a S y . 

Accounting for the circular structure of the matrix II a , we 
have LT Q = F P AuFp, where An is the diagonal matrix 
of eigenvalues: e p ,p € Np. Given the property fl33l in 
Appendix [E] E y is shown to be diagonal 

S y = diag[A + e" 1 ] ,n£N». (28) 
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2) Continuous frequency marginal covariance: In the 
present case, the marginal covariance matrix R y has already 
been derived in Appendix[El Eq. d32i l. Hence, R y and S y are 
diagonal: 



1 



S y = —Ry = diag[A + £ n '],!!£ Njv ■ 



(29) 



Remark 3 — In both cases, T, y only depends on e n /e n for 
n £ INjv. Consequently the likelihood function and the ML 
parameter only depend on the N first coefficients. 

3) Maximization: The opposite of the logarithm of the 
likelihood, namely the Co-Log-Likelihood (CLL) 

CLL(r a ,X) =iVlogr a +logdetS !/ + — yts-iy, (30) 

fa 

must be minimized w.r.t. r a and A. Partial minimization is 
tractable w.r.t. r a and yields r a = y*'E y ~ 1 y/N. Substitution 
of r a in Eq. ( f30b gives: 

CLL(X) =logdetS y + JVlogy+S- 1 y. (31) 

Furthermore, since E y is a diagonal matrix 

N N I 12 



CLL(X) = ^logCA + e-^+AHog^ 



71=1 



n=l 



A + e„ 1 



A' 



io g ^ n( A+e n i ) 



N 

E 



i A + en 1 



AT ' 



in the DF case. Substitution of e ra by e n yields the CF case. 
In both cases, CLL(X) is the logarithm of the ratio of two 
degree N — 1 polynomials of the variable A, with a strictly 
positive denominator. Minimization w.r.t. A is not explicit, but 
it can be numerically performed. 

4) Simulation results: ML hyperparameter selection is il- 
lustrated for the problem of Section IIV-B.2I Computations 
have been performed on the basis of of 512 sample signals 
simulated by filtering standard Gaussian noises with the filter 
of impulse response h = [1, —2, 3, —2, 1]. Let us note a* as 
the true spectrum. 

CLL has been computed on a (ao,ai)-grid of 100 x 100 
logarithmically spaced values from 10 -10 to 10 10 . The first 
observation is that CLL is fairly regular and usually shows a 
unique minimum, located between 10 _1 and 10 1 for oq, and 
between 10~ 2 and 1 for ax- However, a few "degenerated" 
cases have been observed for which d" L or af L seem to be 
null or infinite. Let us note (a™ L , d" L ) as the CLL minimized 
and a^Ls as tne corresponding RLS periodogram. 

Since a* is known in the proposed simulation study, various 
spectral distances [30] can be computed, as functions of ao 
and ot\. L\ distance, L2 distance, the Itakura-Saito divergence 
(ISD) as well as the Itakura-Saito symmetric distance (SIS) 
have been considered. Each one provides an optimal couple 
(oS 1 ,^ 1 ), (a \a\ 2 ), (ao SD ,«in and (fig* \af s ) respec- 
tively. The corresponding spectra are respectively denoted 

A L 1 A L 2 /USD „ nr l -SIS 
OrLS> OrLS> a RLS' anQ RLS ' 

4 Efficient algorithms are available in order to maximize the likelihood, such 
as gradient based [28] or EM type [29]. They have not been implemented 
here as far as a mere feasibility study is concerned. 
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Fig. 3. Qualitative comparison. True spectra (dotted lines) and estimated 
ones (solid lines). The lhs column gives linear plots and the rhs column gives 
logarithmic plots. From top to bottonQ usual periodograms, a^LS' ^rls an< ^ 



According to our experiments, as shown in Fig. [3] a]^ s , 
a^. and the a* can be graded by smoothness and estimation 
accuracy. From the smoothest to the roughest, the following 
gradation has always been observed: SrIs, a* and 
Furthermore, a H Ls is systematically over-smoothed while a^° s 
is systematically under-smoothed. Moreover, the first one 
qualitatively approximates more precisely a* in linear scale, 
whereas the second one reproduces more accurately a* in 
a logarithmic scale and especially the two notches. This is 
due to the presence of the spectra ratio in the Itakura-Saito 
distance which emphasizes the small values of the spectra. 

Finally, to our experience and as shown in Fig. [3] the 
maximum likelihood solution establishes a relevant com- 
promise between and a^° s since it is smooth enough, 
while the two notches remain accurately described. 

Quantitative comparisons have been conducted between the 
two practicable methods (when a* is not known): the usual 
periodogram and the proposed method i.e., the RLS solution 
with automatic ML hyperparameters. The obtained results are 
reported in Table U They clearly show an improvement of 
about 40-50% for all the considered distances. 





Li 


L 2 


AIS 


SIS 


UP 


0.766 


1.14 


751 


750 


RLS + ML 


0.471 


0.567 


420 


422 


Gain 


38.5% 


50.3% 


44.1% 


43.8% 



TABLE I 

Quantitative comparison. The first line refers to the usual 
periodogram while the second one refers to the rls solution 
with ml hyperparameters. the third line gives the 
quantitative improvement. 
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B. Window selection 

It has been shown that the ML technique allows the 
estimation of the regularization parameter. The problem of 
window selection is now addressed. Let us consider a set of K 
windows i.e., K matrices for k £ Njf. Index k becomes a 
new hyperparameter as well as A, and can be jointly estimated. 
The likelihood function QTl ) is now 

CLL(\, k) = logdet (£$) + logJVy^Ej)- 1 !/ . 

Maximization w.r.t. hyperparameters can be achieved in the 
same way as above for each value of k £ Njf. The maximum 
maximorum can then be easily selected. 

Numerous simulations have been performed. They are not 
reported here since they show similar results as the previous 
ones. However, it has been observed that the triangular 
window is the most often selected among: Cauchy, inverse 
cosine, Hanning, Hamming and triangle. 

VI. Conclusion 

In this paper, the usual nonparametric approach to spectral 
analysis has been revisited within the regularization frame- 
work. We have shown that usual and windowed periodograms 
could be obtained via the minimizer of regularized least 
squares criteria. In turn, penalized quadratic criteria are inter- 
preted within the Bayesian framework, so that periodograms 
are interpreted via Bayesian estimators. The corresponding 
prior is a zero-mean Gaussian process, fully specified by 
its correlation function. Particular attention is paid to the 
connection between correlation structure and window shape. 
As regards quadratic regularization, the present study signifi- 
cantly deepens a recent contribution by Sacchi et al. [5], given 
that the latter addresses neither windowed periodograms, nor 
the continuous frequencial setting. Extension to the non- 
quadratic [31] and 2D (time-frequency) case would be of 
particular interest, and we are presently working at this issue. 

Whereas the first part of our contribution provides in- 
terpretations of pre-existing tools for spectral analysis, new 
estimation schemes are derived in the second part: unsuper- 
vised hyperparameter and window selection. It is shown that 
maximum likelihood solutions are both formally achievable 
and practically useful. 

Appendix 

A. Proof of Proposition Q] 

Several proofs are available and the proposed one relies on 
variational principles [32]. Application of these principles to 
quadratic regularization of linear problem yields the functional 
equation [8]: 

-2W*(y- W N a) + 2A/ L2 a = 0, 

where I L i stands for the identity application from L 2 onto 
itself and W N stands for the adjoint application of W N (see 
Appendix IG.lt . After elementary algebra we find: 

(WlW N + \lL*)a = Wly. 



As shown in Appendix IG.ll W N Wt = In, then taking the 
FT and next the IFT gives: 

JV-l 

a» = (1 + Xr'wly = (1 + A)" 1 ]T y n ^ vn . 

n=0 

B. Proof of Proposition [2] 

The minimizer of the RLS criterion (0 obviously is 

a x = (W^W^ + XI^W^y. 

One can refer to Appendix IF. 3 1 for a detailed calculus required 
to analyze the normal matrix (W^ P W NP + XI P ). Wi, P W NP 
and I P are circulant matrices, this property also holds for their 
sum which hence is diagonal in the Fourier basis. Elementary 
algebra leads to d A 

(1 + A) _1 /jv On,p-n 

Op-N,N X~ Ip-JST 



= F P 



In 

Op-N,N 



y 



C. Proof of Proposition \3\ 

The proof is founded on a time domain version of the 
criterion (0, resulting from application of the Plancherel- 
Parseval theorem to the successive derivatives of a: 



d q a 
dvi 



dv = (27m) 



where z n = L a(v)e 2l7run dv. Summation w.r.t. q and inver- 
sion of summation w.r.t. q and w.r.t. n, gives 

R Q( a ) = E e„|z„| 2 , 

where the weighting coefficients e p fulfill ( fl2b . Hence, the 
time domain counterpart of criterion (01 reads: 

Q s (a) = (y - z)t( y - z) + A £ e n \z n \ 2 . 

Thanks to separability, the solution is easily derived: z% = 
(1 + Ae„) _1 y„ if n £ Njv and z% — elsewhere. a u is the 
Fourier transform of the sequence {z%} ne % 



N-l 



71=0 



D. Proof of Proposition 

Elementary linear algebra provides the minimizer of [T3l 

a u - {Wl P W NP + An a )- x Wt p y . 

Accounting for its circular structure, the Fourier basis diago- 
nalizes n a : 

n a = F P K u Fl , 

where An is the diagonal matrix of the eigenvalues 
e , . . . , &p-\ of n a . Hence, 

a" = F P {I P + XA n )y P , 
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and we easily find 



a w = F P y . 



with y p = uipjjp for p S Np, i.e., the data vector windowed 
by 

cj n = (1 + Ae n ) 1 . 

£. Proof of Proposition \5\ 

Let i^o €E [0, 1] and ao = a(vo). Thanks to the linearity of 
the model (01 and thanks to the Gaussian assumption for a and 
b, the joint law of (ao, y) is also Gaussian. Hence, the random 
variable (ao | y) is clearly Gaussian and it is well-known that 
its mean reads 

E[a \y] = Ra yR y 1 y, 

where R aoy — E[aoy^] and R y = E[yy^]. Elementary 
algebra and independence of a and b yield 



R u 



E \a(v )a(v)*\ e ~ 2l7!vn dv + E \a(v )b 



ia(n)e 



Moreover, under the previously mentioned assumptions, the 
generic entry R mn for R y is R mn = E [y m y*] 

r E [(a(v)a(v')*] e *™(™-»'™) d v' dv + r b 5 n - m 

= (j a (n) + r b ) 6 n - m , (32) 

where 5 n stands for the Kronecker sequence. Therefore, R y 
is a diagonal matrix with elements (7 a («) + r b ). Hence 



Af-l 



«0 



= ^[l + Ac° Q (n) x ] 1 y n e 



with A = r b /r a . 

The present Appendix collects several useful properties of 
Fourier operators. In particular, special attention is paid to 
W NP and W N . Some of the stated properties are classical. 
We have reported them in order to make our notations and 
normalization conventions explicit. The other properties are 
less usual, but all of them have straightforward proofs. 

F. Discrete case 

1) Structure of F P : In the case of N = P, the matrix W NP 
identifies with the square matrix F P , where F P performs the 
discrete FT for vectors of size P. We have the well-known 
orthogonality relations F P F P = F P F P = Ip and F P = F P . 

2) Structure ofW^p'- The matrix W NP evaluates the FT 
on a discrete grid of P points for sequences of N points, 
P ^ N. Straightforward expansion of the product provides: 



W NP F P = [ I N O n ,p-n ] 



As a consequence, we obtain 

Wl P y = F, 



In 

Op-N,N 



V = F P y P . 



(33) 



(34) 



where yp is the zero-padded version of y, up to length P. 



3) Structure of Wt, P W NP : The matrix W NP Wl, P has a 
very simple structure since, for P ^ N: W NP Wl, P = In- 
Otherwise, W}, P W NP is a non-negative, Hermitian, P x P 
circulant matrix. Circularity results from digonalization in the 
Fourier basis F P : 

Wl P W NP = F p KFI , 

and, from Eq. (1331 : 



A 



In On,p-n 

Op-N,N Op-N,P-N 



As a consequence, W}i P W NP has only two eigenvalues, 1 and 
0, of respective order N and P — N. Such a structure is useful 
in the proof of Propositions (O and in Appendix . 

G. Continuous case 

1) The W N operator: The linear application W N : a G 
L 2 — > z e <D N is defined by z n = /„* a[y)e 2ilxvn dv for 
n e Nat. The adjoint operator wl : z S — > a = Wtz 
is the linear operator such that: 

VoeL^VzeC" (W N a,z) aN = (a,Wlz} L 2, 

where (-, and (•, ■) L 2 stand for the standard inner product 
in (D and L 2 , respectively. It is given by: 



N-l 



This can be justified as follows: by inverting the order of the 
finite sum 1 and the definite integral J Q , we get 



(>V Jv a,2;) c iv 



,1 N-l 

/ «mE- 

Jo 



Finally, elementary algebra shows that the composed ap- 
plication W N W\i is the identity application from <C N onto 
itself. 

2) Technical results for the Example in \IV-B.2~\ 

a) The Fourier series fi22[ : The proof of d22l) consists 
of three steps. The first one relies on the Fourier relationship 
between Cauchy and Laplace functions 



2(3 



(3 2 + AttH 2 



-P\f\ P -2jntf 



df , ten 



The second step is founded on discrete time t = n E 7L 
and expansion in a series of integrals: 



2/3 



P 2 + An 2 n 2 



,-0\f\ p -2jwnf 



df 



E 

fl 



E 



e -PW-p\ e - 2 ^ nv dv. 



GIOVANNELLI AND IDIER: BAYESIAN INTERPRETATION OF PERIODOGRAMS 



since the invoked series are convergent. The last step is a 
simple geometric series calculus: 

cosh/3(i/ - 1/2) 



e -0\v-p\ 

pew, 



sinh/3/2 



,ve [0,1] 



easily obtained by rewriting the series as the sum of a series 
for p £ {i.e., p ^ v) and a series for p £ TL* + (i.e., p ^ v). 

b) Conditional process: Let us note v, v' £ [0,1], v > 
v' . The partitioned vector a = \a{v), a(v'), a(l)]' = [a|ai]* 
is clearly a zero-mean Gaussian vector with covariance 



Ra 



la (0) 
la {V - V 1 

laky) 



la{v - V 1 
la (0) 

ia{y') 



7a 0) 

ia{y') 

la (0) 



According to the conditional covariance matrix formula, 



\ai 



Ra 



Rda 1 Ra 1 R&ai 



we immediately get d24i i. 
Accounting for the explicit expression for 7 a (i / ) given by d23l l. 
simple expansion of hyperbolic functions yields (f25t . 

c) Law of increments: We have V\,v%,v l ^v l 2 £ [0,1], 
v\ < v 2 < V\ < v' 2 . Let us introduce the collection of the 
four values a — [<z(z/i), a(v 2 ), o,(u[), a(v 2 )} which is clearly 
a zero-mean and Gaussian vector with covariance R a . The 
increment vector % = [a(v 2 ] — 0,(1/1), a { v 2) ~ a { v i)] G ^ 2 is 
a linear transform of the vector a: i = Ha with increment 
covariance Rj 



H = 



-1 




1 

-1 



Ri — HRaH* = 



P 



with n = 2( 7o (0)- 7a ( I / 2 -i/i)), r'i = 2( 7tt (0)-7aK-K)). 

and p = ~fa(v2-V 2 )+la(vi-v'l)-la(vi-V2)~la(v2-v'i)- 

Finally, Taylor development at cxq = yields = [y 2 — 

Vl ){\ - (ya - vi))/2at, r[ = (i/ 2 - ^)(1 - (y< 2 - K))/2ai, 
and p = (v 2 — v\){y' 2 — v' x )/otx, and proves d26l >. 

Acknowledgement 

First author is particularly thankful to Alain, Naomi, 
Philippe and Denise for committed support and coaching. 

References 

[1] E. R. Robinson, "A historical perspective of spectrum estimation", Proc. 

IEEE, vol. 9, no. 9, pp. 885-907, September 1982. 
[2] S. M. Kay and S. L. Marple, "Spectrum analysis - a modern 

perpective", Proc. IEEE, vol. 69, no. 11, pp. 1380-1419, November 

1981. 

[3] S. L. Marple, Digital Spectral Analysis with Applications, Prentice-Hall, 

Englewood Cliffs, NJ, 1987. 
[4] S. M. Kay, Modern Spectral Estimation, Prentice-Hall, Englewood 

Cliffs, NJ, 1988. 

[5] M. D. Sacchi, T. J. Ulrych, and C. J. Walker, "Interpolation and 
extrapolation using a high-resolution discrete Fourier transform", IEEE 
Trans. Signal Processing, vol. 46, no. 1, pp. 31-38, January 1998. 

[6] M. D. Sacchi and T. J. Ulrych, "Estimation of the discrete Fourier 
transform, a linear inversion approach", Geophysics, vol. 61, no. 4, pp. 
1128-1136, 1996. 

[7] H. W. Sorenson, Parameter estimation, vol. 9 of Control and system 
theory, Marcel Dekker, New York Basel, 1980. 

[8] A. Tikhonov and V. Arsenin, Solutions of Ill-Posed Problems, Winston, 
Washington, DC, 1977. 

[9] M. Z. Nashed and G. Wahba, "Generalized inverses in reproducing ker- 
nel spaces: An approach to regularization of linear operators equations", 
SIAM J. Math. Anal, vol. 5, pp. 974-987, 1974. 



[10] G. Demoment, "Image reconstruction and restoration: Overview of 
common estimation structure and problems", IEEE Trans. Acoust. 
Speech, Signal Processing, vol. ASSP-37, no. 12, pp. 2024-2036, 
December 1989. 

[11] G. Kitagawa and W. Gersch, "A smoothness priors long AR model 
method for spectral estimation", IEEE Trans. Automat. Contr, vol. 
AC-30, no. 1, pp. 57-65, January 1985. 

[12] G. Wahba, "Automatic smoothing of the log periodogram", /, of the 
American Statistical Association, Theory and Methods Section, vol. 75, 
no. 369, pp. 122-132, March 1980. 

[13] G. L. Bretthorst, Bayesian Spectrum Analysis and Parameter Estima- 
tion, vol. 48 of Lecture Notes in Statistics, J. Berger, S.Fienberg, J. 
Gani, K. Krickeberg, and B. Singer, Springer- Verlag edition, 1988. 

[14] F. Dublanchet, J. Idier, and P. Duvaut, "Direction-of-arrival and 
frequency estimation using Poisson-Gaussian modeling", in Proc. IEEE 
ICASSP, Munich, Germany, April 1997, pp. 3501-3504. 

[15] F. J. Harris, "On the use of windows for harmonic analysis with the 
discrete Fourier transform", Proc. IEEE, vol. 66, no. 1, pp. 51-83, 
January 1978. 

[16] A. Bertin, Espaces de Hilbert, Service d'Edition de l'ENSTA, 1993. 
[17] H. Cramer and M. R. Leadbetter, Stationary and Related Stochastic 

Processes, John Wiley, New York, London, Sydney, 1967. 
[18] P. Bremaud, Markov Chains. Gihbs fields, Monte Carlo Simulation, 

and Queues, Texts in Applied Mathematics 31. Spinger, New York, 

NY, 1999. 

[19] J. M. F. Moura and G. Sauraj, "Gauss-Markov random fields (GMrf) 

with continuous indices", IEEE Trans. Inf. Theory, vol. 43, no. 5, pp. 

1560-1573, September 1997. 
[20] E. Wong, Stochastic Processes in Information and Dynamical Systems, 

Series in Systems Science. McGraw-Hill Book Company, New York, 

NY, 1971. 

[21] R. N. Bhattacharya and E. C. Waymire, Stochastic Processes with 

Applications, John Willay & Sons, Inc., New York, NY, 1990. 
[22] G. H. Golub, M. Heath, and G. Wahba, "Generalized cross-validation 

as a method for choosing a good ridge parameter", Technometrics, vol. 

21, no. 2, pp. 215-223, May 1979. 
[23] D. M. Titterington, "Common structure of smoothing techniques in 

statistics", Int. Statist. Rev., vol. 53, no. 2, pp. 141-170, 1985. 
[24] P. Hall and D. M. Titterington, "Common structure of techniques for 

choosing smoothing parameter in regression problems", J. R. Statist. 

Soc. B, vol. 49, no. 2, pp. 184-198, 1987. 
[25] A. Thompson, J. C. Brown, J. W. Kay, and D. M. Titterington, "A study 

of methods of choosing the smoothing parameter in image restoration 

by regularization", IEEE Trans. Pattern Anal. Much. Intel!., vol. PAMI- 

13, no. 4, pp. 326-339, April 1991. 
[26] N. Fortier, G. Demoment, and Y. Goussard, "Comparison of GCV 

and ML methods of determining parameters in image restoration by 

regularisation", /. Visual Comm. Image Repres., vol. 4, pp. 157-170, 

1993. 

[27] J.-F. Giovannelli, G. Demoment, and A. Herment, "A Bayesian method 

for long AR spectral estimation: a comparative study", IEEE Trans. 

Ultrasonics Ferroelectrics and Frequency Control, vol. 43, no. 2, pp. 

220-233, March 1996. 
[28] D. P. Bertsekas, Nonlinear programming, Athena Scientific, Belmont, 

MA, 1995. 

[29] R. Shumway and D. Stoffer, "An approach to time series smoothing 
and forecasting using the EM algorithm", /, Time Series Analysis, pp. 
253-264, 1982. 

[30] M. Basseville, "Distance measures for signal processing and pattern 
recognition", Signal Processing, vol. 18, no. 4, pp. 349-369, December 
1989. 

[31] P. Ciuciu, J. Idier, and J.-F. Giovannelli, "Markovian high resolution 
spectral analysis", in Proc. IEEE ICASSP, Phoenix, AZ, March 1999, 
pp. 1601-1604. 

[32] D. G. Luenberger, Optimization by Vector Space Methods, John Wiley, 
New York, NY, 1st edition, 1969. 



